Existing micro-expression recognition methods suffer from insufficient multi-scale feature extraction, inadequate modeling of cooperative relationships between facial regions, and high computational complexity. To address these issues, a hierarchical Transformer for micro-expression recognition with spatial-channel features and graph attention (HT-SCGA) is proposed. First, a multi-scale dynamic window module is designed; through adaptive window expansion, it extracts features hierarchically, from local fine-grained features to global coarse-grained features. Second, a dual-domain feature association module models fine-grained dependencies along both the spatial and channel dimensions, improving feature representation while reducing computational complexity. Finally, a graph attention aggregation module explicitly models semantic dependencies among key facial regions and strengthens the coordinated features of facial action units. Experiments on three benchmark datasets, SMIC, CASME II, and SAMM, show that HT-SCGA outperforms existing methods on the UF1 and UAR metrics, supporting its effectiveness and efficiency for micro-expression recognition.
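The abstract reports results on UF1 and UAR without defining them. The sketch below assumes the definitions commonly used in micro-expression benchmarks: UF1 as the unweighted (macro) average of per-class F1 scores, and UAR as the unweighted average of per-class recall. The function name `uf1_uar` and the toy labels are illustrative only; the paper's exact evaluation protocol (for example, how counts are pooled across cross-validation folds) is not stated here.

```python
from collections import Counter


def uf1_uar(y_true, y_pred, classes=None):
    """Return (UF1, UAR) under the assumed macro-averaged definitions.

    UF1: mean over classes of 2*TP / (2*TP + FP + FN).
    UAR: mean over classes of TP / (TP + FN).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if classes is None:
        # Assumption: every class of interest appears in the ground truth.
        classes = sorted(set(y_true))

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1  # the true class was missed
            fp[p] += 1  # the predicted class was wrongly assigned

    f1_scores, recalls = [], []
    for c in classes:
        f1_denominator = 2 * tp[c] + fp[c] + fn[c]
        f1_scores.append(2 * tp[c] / f1_denominator if f1_denominator else 0.0)
        support = tp[c] + fn[c]
        recalls.append(tp[c] / support if support else 0.0)

    return sum(f1_scores) / len(classes), sum(recalls) / len(classes)


# Toy three-class example (labels are made up for illustration).
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2]
uf1, uar = uf1_uar(y_true, y_pred)
print(round(uf1, 4), round(uar, 4))  # 0.6746 0.75
```

Because both metrics weight every class equally, a model that favors the majority class scores lower than its plain accuracy would suggest, which is one reason such metrics suit class-imbalanced micro-expression datasets.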
CAO Chunping, WEI Jinxin. Hierarchical Transformer for Micro-Expression Recognition with Spatial-Channel Features and Graph Attention. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2025, 38(11): 1027-1040.